Rank | Count | Beginning |
---|---|---|
20474 | 5005 | Бұл |
82259 | 2789 | Сондай-ақ, |
12149 | 2122 | Ашылуы |
65223 | 2098 | Ол |
73222 | 1756 | Осы |
70914 | 1300 | Оның |
65492 | 1239 | Олар |
86124 | 1042 | Сонымен |
63379 | 853 | Өзен |
6662 | 837 | Ал |
85363 | 686 | Сондықтан |
17770 | 669 | Бірақ |
30644 | 629 | Егер |
66033 | 615 | Олардың |
1618 | 612 | 2006 |
44745 | 607 | Қазіргі |
32673 | 601 | Елді |
81237 | 500 | Сол |
42978 | 497 | Қазақ |
60810 | 492 | Мысалы, |
59480 | 442 | Мұндай |
43287 | 334 | Қазақстан |
6744 | 314 | Алайда |
34491 | 314 | Әр |
17749 | 294 | Бір |
57075 | 289 | Мемлекет |
4799 | 280 | Адам |
1266 | 263 | 2001 |
59280 | 259 | Мұнда |
43335 | 247 | Қазақстанда |
The next four subsections show the most frequent sentence beginnings consisting of N words, for N=1, 2, 3, 4. This subsection starts with N=1.
The most frequent word N-grams at the beginning of sentences give some insight into sentence composition.
For N=1 in particular, a small corpus is enough to identify the most frequent sentence beginnings.
select substring_index(sentence, ' ', 1) as beg, count(*) as cnt from sentences group by substring_index(sentence, ' ', 1) order by cnt desc limit 50;
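The query above can also be mirrored outside the database. The following is a minimal sketch, not the pipeline used to produce the table: it assumes the sentences are available as a plain Python list of strings, and the function name `most_frequent_beginnings`, its parameters, and the toy sentences are hypothetical. Setting `n` to 2, 3, or 4 gives the longer beginnings covered in the following subsections.

```python
from collections import Counter


def most_frequent_beginnings(sentences, n=1, limit=50):
    """Return the `limit` most frequent n-word sentence beginnings.

    Hypothetical helper: mimics
    substring_index(sentence, ' ', n) ... group by ... order by cnt desc.
    """
    counts = Counter()
    for sentence in sentences:
        # Keep everything before the n-th space; if the sentence has fewer
        # than n spaces, the whole sentence is kept (as substring_index does).
        beginning = " ".join(sentence.split(" ")[:n])
        counts[beginning] += 1
    # most_common sorts by count, highest first.
    return counts.most_common(limit)


# Toy input for illustration only (not taken from the corpus).
sample = ["Бұл кітап.", "Бұл қала.", "Ол келді.", "Сондай-ақ, ол келді."]

for beginning, count in most_frequent_beginnings(sample, n=1):
    print(count, beginning)
```

As in the SQL version, words are split on single spaces only, so punctuation attached to a word (for example the comma in "Сондай-ақ,") stays part of the beginning.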